.\" Copyright (c) 2005 Nate Lawson
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd April 4, 2022
.Dt CPUFREQ 4
.Os
.Sh NAME
.Nm cpufreq
.Nd CPU frequency control framework
.Sh SYNOPSIS
.Cd "device cpufreq"
.Pp
.In sys/cpu.h
.Ft int
.Fn cpufreq_levels "device_t dev" "struct cf_level *levels" "int *count"
.Ft int
.Fn cpufreq_set "device_t dev" "const struct cf_level *level" "int priority"
.Ft int
.Fn cpufreq_get "device_t dev" "struct cf_level *level"
.Ft int
.Fo cpufreq_drv_settings
.Fa "device_t dev"
.Fa "struct cf_setting *sets"
.Fa "int *count"
.Fc
.Ft int
.Fn cpufreq_drv_type "device_t dev" "int *type"
.Ft int
.Fn cpufreq_drv_set "device_t dev" "const struct cf_setting *set"
.Ft int
.Fn cpufreq_drv_get "device_t dev" "struct cf_setting *set"
.Sh DESCRIPTION
The
.Nm
driver provides a unified kernel and user interface to CPU frequency
control drivers.
It combines multiple drivers offering different settings into a single
interface of all possible levels.
Users can access this interface directly via
.Xr sysctl 8
or by indicating to
.Pa /etc/rc.d/power_profile
that it should switch settings when the AC line state changes via
.Xr rc.conf 5 .
.Sh SYSCTL VARIABLES
These settings may be overridden by kernel drivers requesting alternate
settings.
If this occurs, the original values will be restored once the condition
has passed (e.g., the system has cooled sufficiently).
If a sysctl cannot be set due to an override condition, it will return
.Er EPERM .
.Pp
The frequency cannot be changed if TSC is in use as the timecounter and the
hardware does not support invariant TSC.
This is because the timecounter system needs to use a source that has a
constant rate.
(On invariant TSC hardware, the TSC runs at the P0 rate regardless of the
configured P-state.)
Modern hardware mostly has invariant TSC.
The timecounter source can be changed with the
.Pa kern.timecounter.hardware
sysctl.
Available modes are in
.Pa kern.timecounter.choice
sysctl entry.
.Bl -tag -width indent
.It Va dev.cpu.%d.freq
Current active CPU frequency in MHz.
.It Va dev.cpu.%d.freq_driver
The specific
.Nm
driver used by this cpu.
.It Va dev.cpu.%d.freq_levels
Currently available levels for the CPU (frequency/power usage).
Values are in units of MHz and milliwatts.
.It Va dev.DEVICE.%d.freq_settings
Currently available settings for the driver (frequency/power usage).
Values are in units of MHz and milliwatts.
This is helpful for understanding which settings are offered by which
driver for debugging purposes.
.It Va debug.cpufreq.lowest
Lowest CPU frequency in MHz to offer to users.
This setting is also accessible via a tunable with the same name.
This can be used to disable very low levels that may be unusable on
some systems.
.It Va debug.cpufreq.verbose
Print verbose messages.
This setting is also accessible via a tunable with the same name.
.It Va debug.hwpstate_pstate_limit
If enabled, the AMD hwpstate driver limits administrative control of P-states
(including by
.Xr powerd 8 )
to the value in the 0xc0010061 MSR, known as "PStateCurLim[CurPstateLimit]."
It is disabled (0) by default.
On some hardware, the limit register seems to simply follow the configured
P-state, which results in the inability to ever raise the P-state back to P0
from a reduced frequency state.
.El
.Sh SUPPORTED DRIVERS
The following device drivers offer absolute frequency control via the
.Nm
interface.
Usually, only one of these can be active at a time.
.Pp
.Bl -tag -compact -width "hwpstate_intel(4)"
.It acpi_perf
ACPI CPU performance states
.It Xr est 4
Intel Enhanced SpeedStep
.It hwpstate
AMD Cool'n'Quiet2 used in K10 through Family 17h
.It Xr hwpstate_intel 4
Intel SpeedShift driver
.It ichss
Intel SpeedStep for ICH
.It powernow
AMD PowerNow!\& and Cool'n'Quiet for K7 and K8
.It smist
Intel SMI-based SpeedStep for PIIX4
.El
.Pp
The following device drivers offer relative frequency control and
have an additive effect:
.Pp
.Bl -tag -compact -width "acpi_throttle"
.It acpi_throttle
ACPI CPU throttling
.It p4tcc
Pentium 4 Thermal Control Circuitry
.El
.Sh KERNEL INTERFACE
Kernel components can query and set CPU frequencies through the
.Nm
kernel interface.
This involves obtaining a
.Nm
device, calling
.Fn cpufreq_levels
to get the currently available frequency levels,
checking the current level with
.Fn cpufreq_get ,
and setting a new one from the list with
.Fn cpufreq_set .
Each level may actually reference more than one
.Nm
driver but kernel components do not need to be aware of this.
The
.Va total_set
element of
.Vt "struct cf_level"
provides a summary of the frequency and power for this level.
Unknown or irrelevant values are set to
.Dv CPUFREQ_VAL_UNKNOWN .
.Pp
The
.Fn cpufreq_levels
method takes a
.Nm
device and an empty array of
.Fa levels .
The
.Fa count
value should be set to the number of levels available and after the
function completes, will be set to the actual number of levels returned.
If there are more levels than
.Fa count
will allow, it should return
.Er E2BIG .
.Pp
The
.Fn cpufreq_get
method takes a pointer to space to store a
.Fa level .
After successful completion, the output will be the current active level
and is equal to one of the levels returned by
.Fn cpufreq_levels .
.Pp
The
.Fn cpufreq_set
method takes a pointer a
.Fa level
and attempts to activate it.
The
.Fa priority
(i.e.,
.Dv CPUFREQ_PRIO_KERN )
tells
.Nm
whether to override previous settings while activating this level.
If
.Fa priority
is higher than the current active level, that level will be saved and
overridden with the new level.
If a level is already saved, the new level is set without overwriting
the older saved level.
If
.Fn cpufreq_set
is called with a
.Dv NULL
.Fa level ,
the saved level will be restored.
If there is no saved level,
.Fn cpufreq_set
will return
.Er ENXIO .
If
.Fa priority
is lower than the current active level's priority, this method returns
.Er EPERM .
.Sh DRIVER INTERFACE
Kernel drivers offering hardware-specific CPU frequency control export
their individual settings through the
.Nm
driver interface.
This involves implementing these methods:
.Fn cpufreq_drv_settings ,
.Fn cpufreq_drv_type ,
.Fn cpufreq_drv_set ,
and
.Fn cpufreq_drv_get .
Additionally, the driver must attach a device as a child of a CPU
device so that these methods can be called by the
.Nm
framework.
.Pp
The
.Fn cpufreq_drv_settings
method returns an array of currently available settings, each of type
.Vt "struct cf_setting" .
The driver should set unknown or irrelevant values to
.Dv CPUFREQ_VAL_UNKNOWN .
All the following elements for each setting should be returned:
.Bd -literal
struct cf_setting {
	int	freq;	/* CPU clock in MHz or 100ths of a percent. */
	int	volts;	/* Voltage in mV. */
	int	power;	/* Power consumed in mW. */
	int	lat;	/* Transition latency in us. */
	device_t dev;	/* Driver providing this setting. */
};
.Ed
.Pp
On entry to this method,
.Fa count
contains the number of settings that can be returned.
On successful completion, the driver sets it to the actual number of
settings returned.
If the driver offers more settings than
.Fa count
will allow, it should return
.Er E2BIG .
.Pp
The
.Fn cpufreq_drv_type
method indicates the type of settings it offers, either
.Dv CPUFREQ_TYPE_ABSOLUTE
or
.Dv CPUFREQ_TYPE_RELATIVE .
Additionally, the driver may set the
.Dv CPUFREQ_FLAG_INFO_ONLY
flag if the settings it provides are information for other drivers only
and cannot be passed to
.Fn cpufreq_drv_set
to activate them.
.Pp
The
.Fn cpufreq_drv_set
method takes a driver setting and makes it active.
If the setting is invalid or not currently available, it should return
.Er EINVAL .
.Pp
The
.Fn cpufreq_drv_get
method returns the currently-active driver setting.
The
.Vt "struct cf_setting"
returned must be valid for passing to
.Fn cpufreq_drv_set ,
including all elements being filled out correctly.
If the driver cannot infer the current setting
(even by estimating it with
.Fn cpu_est_clockrate )
then it should set all elements to
.Dv CPUFREQ_VAL_UNKNOWN .
.Sh SEE ALSO
.Xr acpi 4 ,
.Xr est 4 ,
.Xr timecounters 4 ,
.Xr powerd 8 ,
.Xr sysctl 8
.Sh AUTHORS
.An Nate Lawson
.An Bruno Ducrot
contributed the
.Pa powernow
driver.
.Sh BUGS
The following drivers have not yet been converted to the
.Nm
interface:
.Xr longrun 4 .
.Pp
Notification of CPU and bus frequency changes is not implemented yet.
.Pp
When multiple CPUs offer frequency control, they cannot be set to different
levels and must all offer the same frequency settings.
